PageRank Computation, with Special Attention to Dangling Nodes
Abstract. We present a simple algorithm for computing the PageRank (stationary distribution) of the stochastic Google matrix G. The algorithm lumps all dangling nodes into a single node. We express lumping as a similarity transformation of G, and show that the PageRank of the nondangling nodes can be computed separately from that of the dangling nodes. The algorithm applies the power method only to the smaller lumped matrix, but the convergence rate is the same as that of the power method applied to the full matrix G. The efficiency of the algorithm increases as the number of dangling nodes increases. We extend the expression for PageRank and the algorithm to more general Google matrices that have several different dangling node vectors, as needed when different classes of dangling nodes must be distinguished. We also analyze the effect of the dangling node vector on the PageRank, and show that the PageRank of the dangling nodes depends strongly on that of the nondangling nodes but not vice versa. Finally, we present a Jordan decomposition of the Google matrix for the (theoretical) extreme case when all web pages are dangling nodes.
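The lumping idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation. It assumes the common form G = alpha (H + d w^T) + (1 - alpha) 1 v^T, which the abstract does not spell out, where H is the hyperlink matrix with zero rows for dangling nodes, d marks the dangling rows, w is the dangling node vector and v is the personalization vector. The function names, the dense list-of-lists storage and the default alpha = 0.85 are all illustrative choices. The power method runs only on the (k+1)-by-(k+1) lumped matrix, where k is the number of nondangling nodes, and the PageRank of the dangling nodes is recovered afterwards with one vector-matrix product.

```python
def power_method(M, tol=1e-12, max_iter=10_000):
    """Left stationary vector x (x = x M, sum(x) = 1) of a row-stochastic matrix M."""
    n = len(M)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]
        s = sum(y)
        y = [yj / s for yj in y]
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def google_matrix(H, v, w, alpha=0.85):
    """Full matrix G under the assumed form alpha*(H + d w^T) + (1-alpha)*1 v^T."""
    n = len(H)
    G = []
    for i in range(n):
        row = H[i] if any(H[i]) else w  # a dangling (all-zero) row is replaced by w
        G.append([alpha * row[j] + (1 - alpha) * v[j] for j in range(n)])
    return G


def lumped_pagerank(H, v, w, alpha=0.85):
    """PageRank via lumping all dangling nodes into one node (illustrative sketch)."""
    n = len(H)
    nd = [i for i in range(n) if any(H[i])]      # nondangling nodes
    dg = [i for i in range(n) if not any(H[i])]  # dangling nodes
    k = len(nd)
    # Every dangling row of G equals u = alpha*w + (1-alpha)*v.
    u = [alpha * w[j] + (1 - alpha) * v[j] for j in range(n)]
    # Rows of G that belong to nondangling nodes.
    top = {i: [alpha * H[i][j] + (1 - alpha) * v[j] for j in range(n)] for i in nd}
    # Lumped matrix: columns of the dangling nodes are summed into one column,
    # and the identical dangling rows are merged into one row.
    L = [[top[i][j] for j in nd] + [sum(top[i][j] for j in dg)] for i in nd]
    L.append([u[j] for j in nd] + [sum(u[j] for j in dg)])
    sigma = power_method(L)
    pi = [0.0] * n
    for a, i in enumerate(nd):
        pi[i] = sigma[a]  # nondangling PageRank comes straight from the lumped chain
    for j in dg:
        # One extra vector-matrix product recovers the dangling nodes.
        pi[j] = sum(sigma[a] * top[i][j] for a, i in enumerate(nd)) + sigma[k] * u[j]
    return pi
```

On a small graph the result can be compared with `power_method(google_matrix(H, v, w))`, which iterates on the full matrix. The two vectors should agree up to the iteration tolerance.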
Related articles
A Fast Two-Stage Algorithm for Computing PageRank and Its Extensions
We present a fast two-stage algorithm for computing the PageRank vector [16]. The algorithm exploits the following observation: the homogeneous discrete-time Markov chain associated with PageRank is lumpable, with the lumpable subset of nodes being the dangling nodes [13]. Time to convergence is only a fraction of what’s required for the standard algorithm employed by Google [16]. On data of 45...
Traps and Pitfalls of Topic-Biased PageRank
We discuss a number of issues in the definition, computation and comparison of PageRank values that have been addressed sparsely in the literature, often with contradictory approaches. We study the difference between weakly and strongly preferential PageRank, which patch the dangling nodes with different distributions, extending analytical formulae known for the strongly preferential case, and ...
Determining Factors Behind the PageRank Log-Log Plot
We study the relation between PageRank and other parameters of information networks such as in-degree, out-degree, and the fraction of dangling nodes. We model this relation through a stochastic equation inspired by the original definition of PageRank. Further, we use the theory of regular variation to prove that PageRank and in-degree follow power laws with the same exponent. The difference be...
On the Localization of the Personalized PageRank of Complex Networks
In this paper new results on personalized PageRank are shown. We consider directed graphs that may contain dangling nodes. The main result presented gives an analytical characterization of all the possible values of the personalized PageRank for any node. We use this result to give a theoretical justification of a recent model that uses the personalized PageRank to classify users of Social Netwo...
Application of Markov Chain in the PageRank Algorithm
Link analysis algorithms for Web search engines determine the importance and relevance of Web pages. Among the link analysis algorithms, PageRank is the state-of-the-art ranking mechanism that is used in the Google search engine today. The PageRank algorithm is modeled as the behavior of a randomized Web surfer; this model can be seen as a Markov chain to predict the behavior of a system that travels...
Journal: SIAM J. Matrix Analysis Applications
Volume: 29
Publication date: 2007